SmolDocling, a 256M-parameter vision-language model from IBM Research, performs document OCR and multimodal processing at 0.35 s per page on consumer GPUs, handling text, formulas, code, and charts.
Ivy-VL, a lightweight 3B-parameter vision-language model, outperforms 7B models, ranks #1 on OpenCompass among models under 4B parameters, and enables real-time use on AI glasses. It is an open-source edge AI solution from AI Safeguard, CMU, and Stanford.